Protein Science

Wiley

Preprints posted in the last 90 days, ranked by how well they match Protein Science's content profile, based on 221 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.

1
Design to Data for mutants of β-glucosidase B from Paenibacillus polymyxa: Y333F, A88E, L219Q, A408H, Y173L, E340S, and Y422F

Maduros, A.; Farinsky, L.; Tagkopoulos, P.; Vater, A.; Siegel, J. B.

2026-02-05 biochemistry 10.64898/2026.02.04.703908 medRxiv
Top 0.1%
25.9%

This study explores computational design predictions related to experimental enzyme behavior by analyzing seven single-point mutants of β-glucosidase B (BglB) from Paenibacillus polymyxa: Y333F, A88E, L219Q, A408H, Y173L, E340S, and Y422F. Each mutation was modeled using Foldit Standalone, and mutant selections were based on predicted thermodynamic stability changes of interest. Six of the seven mutants in this set yielded soluble, expressed protein. Most variants had similar catalytic efficiency compared to the wild type with one exception. The melting temperatures for most variants were also similar to the wild type. Correlation analysis revealed weak but potentially informative relationships between predicted ΔTSE and (a) thermal stability and (b) catalytic efficiency. These results further support known limitations of TSE score as a tool for single point mutation design and add to a growing dataset being generated to build the next generation of functionally predictive protein models.

2
GEF me a break: the consequences of freezing Rho guanine-nucleotide exchange factor catalytic domains

Anderson, L. K.; Barpal, E.; Mendoza, H.; Cash, J. N.

2026-04-09 biochemistry 10.64898/2026.04.08.717323 medRxiv
Top 0.1%
22.5%

Purified proteins are routinely flash frozen for use in functional and structural studies, providing a convenient way to reproduce results across complex experiments. Rho guanine-nucleotide exchange factors (RhoGEFs) are no exception to this practice, yet the effects of freezing on their activity and stability remain largely uncharacterized. This gap potentially affects the characterization of these important enzymes and how results are interpreted with respect to their prospective use as therapeutic targets. Here, we tested the isolated DH/PH tandems of P-Rex1, P-Rex2, and PRG under different cryoprotectant conditions and monitored activity and thermostability over time after flash freezing. Our results show a clear divergence between the activity of fresh and frozen purified RhoGEF protein samples in as little as one week for some conditions. Specifically, the variability in data collected on frozen samples was greatly increased. Despite these differences, thermostability seems to be preserved for much longer timepoints across RhoGEFs. Moreover, despite eventual changes in both activity and thermostability with respect to freezing, there are no obvious changes in global conformation between fresh and frozen samples of the isolated P-Rex2 DH/PH tandem. From our data, there are few generalizable trends between the different RhoGEFs and no single cryoprotective agent tested was a silver bullet to preserve both activity and thermostability across RhoGEFs. Overall, our findings emphasize the unpredictable effects of freezing RhoGEFs. As such, RhoGEF freezing should be carefully characterized for each protein and critically viewed when comparing analyses between different studies.

3
Environmental and mutational modulation of collateral fitness effects informs their mechanisms

Goff, C.; Tsou, E.-Y.; Mehlhoff, J. D.; Ostermeier, M.

2026-01-23 evolutionary biology 10.64898/2026.01.22.699087 medRxiv
Top 0.1%
22.5%

Fitness effects of mutations that do not arise from changes in a protein's ability to perform its physiological functions (called collateral fitness effects or CFEs) are an understudied aspect of fitness landscapes. We have previously systematically measured the CFEs of all possible single amino acid substitutions in four proteins and found the frequency of deleterious mutations to vary by two orders of magnitude. Of these proteins, TEM-1 β-lactamase had the highest frequency, and deleterious mutations caused TEM-1 aggregation. Here, we systematically measured TEM-1 collateral fitness landscapes in environments and situations expected to alter protein aggregation or protein stability. We found a moderate correlation between deleterious CFEs and predicted thermodynamic stability effects in TEM-1's -domain. Empirically, we found that the frequency and magnitude of deleterious CFEs can be reduced by altering the growth environment to disfavor aggregation (i.e. reducing the growth temperature or shifting to minimal media) or by stabilizing TEM-1 (via the M182T mutation or the addition of the β-lactamase inhibitor avibactam to the growth medium). However, although raising the growth temperature to favor aggregation exacerbated deleterious CFEs of many mutations, many mutations' effects were reduced. Furthermore, although reductions in CFEs occurred with reductions in TEM-1 aggregation for some mutants, for many mutants they did not. We propose that mutational destabilization exposes protein motifs that can cause deleterious CFEs, but that these motifs and those that cause aggregation are not necessarily the same motifs.

4
A stickiness scale for disordered proteins

Cao, F.; Tesei, G.; Lindorff-Larsen, K.

2026-01-27 biophysics 10.64898/2026.01.25.701651 medRxiv
Top 0.1%
22.2%

Disordered proteins are a heterogeneous group of proteins that play a broad range of functions in biology, and display conformational properties that range from compact globules to expanded chains. We here describe the results of a data-driven approach to derive a scale that represents the propensity of the twenty amino acids to interact with one another relative to water. The scale is based on biophysical experiments on 115 proteins and can be thought of as a stickiness (or hydropathy) scale specific for disordered proteins. We compare the scale to 70 other previously reported hydropathy scales and find that it is closer to four scales related to membrane proteins or the transition temperatures of elastin-like peptides. We envisage that the new scale will be useful in bioinformatics and machine learning approaches to quantify the role of sequence composition and patterning in disordered proteins, to understand the driving forces for their interactions with other molecules, and their evolutionary conservation.

5
Global analysis of thermal and chemical denaturation using CheMelt: Thermodynamic dissection of highly thermostable de novo designed proteins

Lampinen, V.; Burastero, O.; Guazzelli, I. P.; Vogele, F.; Pinheiro, F.; Nowak, J. S.; Garcia Alai, M. M.; Kjaergaard, M.

2026-04-09 biophysics 10.64898/2026.04.07.716910 medRxiv
Top 0.1%
21.7%

De novo protein design often produces thermostable proteins that denature above 100 °C, which complicates the analysis of their stability. Thermostable proteins can be unfolded by combined chemical and thermal denaturation followed by global analysis of multiple melting curves. Here, we have developed CheMelt, a new online tool for global analysis of unfolding data via an intuitive graphical user interface. We use nanoscale differential scanning fluorimetry followed by CheMelt data analysis to dissect the combined thermal and chemical denaturation of thirty-five de novo designed protein binders. Fifteen present sufficient fluorescence changes to extract thermodynamic parameters of unfolding. These de novo designed proteins have systematically lower ΔCp and m-values than comparable natural proteins, which implies that they expose fewer hydrophobic residues upon unfolding. We show that a high thermostability of a designed protein does not necessarily imply a high equilibrium stability; and demonstrate the potential of CheMelt in dissecting thermodynamic properties for protein design and engineering.
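
As a rough illustration of what a combined thermal and chemical denaturation model can look like, the sketch below uses the textbook two-state description: a Gibbs-Helmholtz term for temperature plus a linear extrapolation term for denaturant. This is an assumption made for illustration only; the abstract does not state which equations CheMelt fits, and the function names, parameter names, and example values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_unfold(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value):
    """Two-state unfolding free energy in kJ/mol (textbook model, assumed here).

    temp_k       : temperature in K
    denaturant_m : denaturant concentration in M
    tm_k         : melting temperature in K without denaturant
    dh_m         : unfolding enthalpy at tm_k in kJ/mol
    dcp          : heat capacity change of unfolding in kJ/(mol*K)
    m_value      : denaturant dependence of the free energy in kJ/(mol*M)
    """
    # Gibbs-Helmholtz contribution of temperature
    thermal = dh_m * (1.0 - temp_k / tm_k) + dcp * (
        temp_k - tm_k - temp_k * math.log(temp_k / tm_k)
    )
    # Linear extrapolation model: denaturant lowers the unfolding free energy
    return thermal - m_value * denaturant_m


def fraction_unfolded(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value):
    """Equilibrium fraction of unfolded protein for a two-state transition."""
    dg = delta_g_unfold(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value)
    k_unfold = math.exp(-dg / (R * temp_k))
    return k_unfold / (1.0 + k_unfold)


if __name__ == "__main__":
    # Hypothetical parameters for a protein that melts above 100 °C in buffer
    params = dict(tm_k=380.0, dh_m=400.0, dcp=5.0, m_value=10.0)
    for conc in (0.0, 2.0, 5.0):
        print(conc, round(fraction_unfolded(298.15, conc, **params), 4))
```

In a global analysis, one shared parameter set of this kind would be fitted simultaneously to melting curves recorded at several denaturant concentrations, which is what makes the unfolding transition of a very thermostable protein experimentally accessible.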

6
Defining the Active Conformation of Typical Protein Kinases Domains from Substrate-Bound PDB Structures Enables Active-State AlphaFold2 Models for All 437 Human Catalytic Protein Kinases

Gizzio, J.; Faezov, B.; Xu, Q.; Dunbrack, R. L.

2026-02-19 bioinformatics 10.64898/2026.02.19.706771 medRxiv
Top 0.1%
19.1%

Humans have 437 catalytically competent protein kinase domains with the typical kinase fold, similar to the structure of Protein Kinase A (PKA). The active form of a kinase must satisfy requirements for binding ATP, magnesium, and substrate. From structural bioinformatics analysis of 248 crystal structures of 54 unique substrate-bound kinases, we derived structural criteria for the active form of typical protein kinases. We include well-known requirements on the DFG motif of the activation loop and the N-terminal domain salt bridge, but also on the positions of the N-terminal and C-terminal segments of the activation loop that must be placed appropriately to bind substrate. With these criteria, only 130 of the 437 human catalytic protein kinases (30%) are in the Protein Data Bank in their active form. Because the active forms of catalytic kinases are needed for understanding substrate specificity and the effects of mutations on catalytic activity in cancer and other diseases, we used AlphaFold2 to produce models of all 437 human protein kinases in the active form. This was accomplished with templates from the PDB that resemble substrate-bound structures, shallow multiple sequence alignments of orthologs and close paralogs of the query protein, and application of the active-kinase criteria to the output models. We selected models for each kinase based on intramolecular ipSAE scores of the activation loop residues of these models, demonstrating that the highest scoring models have the lowest or close to the lowest RMSD to 29 non-redundant substrate-bound structures in the PDB. A larger benchmark of 117 active kinase structures with solved activation loops in the PDB shows that 71% of the highest scoring AlphaFold2 models had backbone RMSD < 1.0 Å to the benchmark structures and 92% were within 2.0 Å. Models for all 437 catalytic kinases are available at https://dunbrack.fccc.edu/kincore/activemodels. We believe they may be useful for interpreting mutations leading to constitutive catalytic activity in cancer as well as for templates for modeling substrate and inhibitor binding for molecules which bind to the active state.

7
Structural basis for saccharide binding by human RNase 2/EDN, a protein combining enzymatic and lectin properties

Kang, X.; Prats-Ejarque, G.; Boix, E.; Li, J.

2026-03-23 biochemistry 10.64898/2026.03.20.713198 medRxiv
Top 0.1%
19.0%

Human RNase 2 (eosinophil-derived neurotoxin, EDN) is a major eosinophil granule protein of the vertebrate-specific RNase A superfamily and is involved in antiviral response and inflammation. Identifying ligand-binding pockets in EDN is thus relevant to structure-based drug design. In our laboratory we identified by protein crystallography a conserved site at the protein surface binding to carboxylic anion molecules (malonate, tartrate and citrate). Searching for potential biomolecules rich in anion groups and considering previous report of EDN binding to glycosaminoglycans, we explored the protein binding to saccharides. Next, EDN crystals were soaked with mono- and disaccharides, and the 3D structures of ten complexes were solved by X-ray crystallography at atomic resolution. We identified protein binding pockets to glucose, fucose, mannose, sucrose, galactose, trehalose, N-acetyl-D-glucosamine, N-acetylmuramic acid, and the sialic acid N-acetylneuraminic acid. A main site for glucose, fucose, and galactose was located adjacent to the spotted carboxylic anion site. Secondarily, N-acetylneuraminic acid, N-acetylmuramic acid, sucrose, galactose, and mannose shared another protein surface region. Overall, the saccharides clustered into seven defined sites, outlining a conserved recognition pattern, which was further analysed by molecular modelling. Interestingly, within the RNase A family, we find amphibian RNases that were initially isolated as carbohydrate binding proteins and named as leczymes, combining enzymatic and lectin properties. The present data is the first systematic structural characterization of a mammalian sugar-binding RNase within the family. The results highlight unique EDN residues that mediate its sugar specific interactions, of particular interest for a better understanding of the protein physiological role. 
Highlights: structure of RNase 2 in complex with mono- and disaccharides at atomic resolution; identification of RNase 2 unique sugar binding sites; characterization of a mammalian RNase A family enzyme with lectin properties.

8
Sequence-encoded differences in the conformational ensembles of CITED transcriptional activation domains impact coactivator binding

Do, T. U.; Kraft, E. J.; Chappell, G. F.; Parnham, S.; Berlow, R. B.

2026-01-21 biophysics 10.64898/2026.01.20.700670 medRxiv
Top 0.1%
18.6%

Recent advances in predicting and modeling conformational ensembles of intrinsically disordered proteins (IDPs) have provided much needed insights into sequence-ensemble relationships. It is thought that conservation of physicochemical properties, but not the exact identity or order of the amino acids, maintains IDP ensemble properties that are crucial for function. However, detailed experimental studies are still required to fully understand the relationships between sequence and function in IDPs. The human CITED proteins, which are fully disordered transcriptional regulators, share conserved C-terminal transactivation domains (CTADs) that interact with the TAZ1 domain of the transcriptional coactivators CBP/p300. The conserved CTADs harbor amino acid substitutions in regions that are known to be important for interactions of CITED2 with TAZ1, but the effects of these substitutions on TAZ1 binding for the other CITED proteins are unknown. Here, we use solution NMR spectroscopy, circular dichroism, and surface plasmon resonance to characterize the conformational ensembles, dynamics, and interactions of the CITED CTADs. The CTADs are disordered in isolation, although the CITED2 CTAD uniquely displays residual helical structure that is sensitive to ionic strength and protein concentration. In contrast, the CITED1 and CITED4 CTADs remain largely disordered and exhibit more uniform dynamics. Quantitative binding measurements reveal differences in thermodynamics and kinetics for the CTADs' interactions with TAZ1, with CITED2 binding most tightly and CITED4 exhibiting significantly weaker affinity. Our results highlight the sensitivity of IDP conformational ensembles to minor sequence changes and the impacts that changes in IDP structures and dynamics can have on biological functions.

9
Analysis and design of disordered polypeptides with optimized sequence patterning properties

Singh, A.; Ukperaj, A. I.; Dignon, G. L.

2026-02-20 biophysics 10.64898/2026.02.20.707115 medRxiv
Top 0.1%
17.0%

Intrinsically disordered proteins (IDPs) exhibit phase separation behavior that is closely linked to their degree of single-chain compaction, which in turn is governed by both amino acid composition and sequence patterning. Existing metrics such as sequence charge decoration (SCD) and sequence hydropathy decoration (SHD) describe these effects but are largely limited to describing differences between sequences of similar length and overall composition. In this work, we present a shuffle-based normalization scheme for SCD and SHD, enabling comparison of sequence patterning between very different IDP sequences. Leveraging this normalization scheme toward design space, we develop a Monte Carlo-based sequence design algorithm that generates novel IDPs with desired patterning features. Our design framework is further strengthened by incorporating additional metrics such as sequence aromatic decoration (SAD), compositional RMSD, and a previously developed sequence-based ΔG predictor. We validate our approach through coarse-grained MD simulations, showing that the designed sequences exhibit tunable phase behavior. This strategy lays the groundwork for rational design of IDPs for biomedical and biotechnology applications, as well as basic biophysical research. Author summary: Intrinsically disordered proteins behave similarly to polymers in solution, having no defined structure. Their behavior is dictated by the collection of shapes the protein adopts, known as its "conformational ensemble", which is tuned by its amino acid sequence and the solution environment. In this work, we have developed parameters to describe the patterning of charged and hydrophobic amino acids within these protein sequences, which are predictive of their ability to phase separate and form dense liquid-like droplets in solution. Importantly, the parameters we develop are motivated by physics and can be applied across a large number of amino acid sequences rapidly. This will enable researchers to rapidly predict the behavior of large libraries of protein sequences. We have additionally developed software to design randomized amino acid sequences with desired amino acid composition and patterning properties. Finally, we have tested our design scheme and parameters by running simulations of designed IDP sequences and quantified the ability of each to phase separate.
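
To make the idea of shuffle-based normalization concrete, here is a minimal sketch. It computes SCD in its commonly published form, SCD = (1/N) Σ_{i<j} q_i q_j |i − j|^{1/2}, and then expresses it as a z-score against random shuffles of the same sequence. The z-score is an assumption chosen for illustration; the abstract does not give the authors' actual normalization formula, and the charge table, function names, and shuffle count are likewise hypothetical.

```python
import random
import statistics

# Simplified charge assignment (assumption): histidine and termini are neutral
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def scd(seq):
    """Sequence charge decoration: (1/N) * sum over i<j of q_i * q_j * sqrt(j - i)."""
    n = len(seq)
    q = [CHARGE.get(residue, 0.0) for residue in seq]
    total = 0.0
    for i in range(n):
        if q[i] == 0.0:
            continue  # neutral residues contribute nothing
        for j in range(i + 1, n):
            total += q[i] * q[j] * (j - i) ** 0.5
    return total / n


def shuffled_z_score(seq, n_shuffles=200, seed=0):
    """Z-score of SCD against shuffles that keep length and composition fixed."""
    rng = random.Random(seed)
    residues = list(seq)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        samples.append(scd("".join(residues)))
    spread = statistics.stdev(samples)
    if spread == 0.0:
        return 0.0  # every shuffle gives the same SCD, e.g. no charged residues
    return (scd(seq) - statistics.fmean(samples)) / spread


if __name__ == "__main__":
    for name, seq in [("blocky", "KKKKKEEEEE"), ("alternating", "KEKEKEKEKE")]:
        print(name, round(scd(seq), 3), round(shuffled_z_score(seq), 2))
```

Because each sequence is compared only with shuffles of itself, the resulting score no longer depends directly on length or composition, which is the property that allows patterning to be compared between very different sequences.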

10
Solvent Isotope Effect on the Stability of a Heterodimeric Protein

Bhattacharjee, R.; Udgaonkar, J. B.

2026-02-14 biophysics 10.64898/2026.02.12.705670 medRxiv
Top 0.1%
14.5%

Protein stability arises from a fine balance between stabilizing forces such as hydrophobic interactions, hydrogen bonding, and ionic interactions, and destabilizing contributions from solvent exposure and electrostatics. Although hydrophobic burial is the dominant driving force for folding, intra-chain hydrogen bonds and ionic interactions modulate stability in context-dependent ways, with effects that vary depending on their location and environment within the protein. Most studies of protein stability have focused on perturbations induced by pH, solvent composition, or mutations in protonated water, leaving the influence of solvent isotopes relatively underexplored. Notably, despite stronger hydrogen bonding in D2O, proteins exhibit diverse stability responses upon transfer from H2O to D2O, suggesting that differential hydration of nonpolar groups plays a key role. Here, the solvent isotope effect on protein stability is investigated using double-chain monellin (dcMN), a β-sheet-rich, two-chain protein with well-characterized folding behavior. By combining conventional equilibrium unfolding measurements with hydrogen-deuterium exchange mass spectrometry (HDX-MS), the stability of wild-type and a less hydrophobic mutant (C42A) dcMN was compared in H2O and D2O, revealing greater stabilization of the wild-type protein in D2O and highlighting the importance of hydrophobic interactions in governing isotope-dependent stability.

11
Towards crystal structures of filament forming proteins

Roske, Y.; Leidert, M.; Rehbein, K.; Diehl, A.

2026-02-22 biochemistry 10.64898/2026.02.22.707290 medRxiv
Top 0.1%
14.4%

Filament-forming proteins such as TasA (Bacillus subtilis) and camelysins CalY1, CalY2 (Bacillus cereus) pose a particular challenge for structural analysis due to their strong tendency to self-association and their polydispersity, which severely limits their ability to crystallize or to be a target for NMR-spectroscopy. To address this, it is necessary to modify the amino acid sequence to prevent filamentation. Engineering a series of N- and C-terminal truncated variants by removing flexible parts is often key to success. N-terminal extensions are also a powerful tool for obtaining crystals of fiber-forming proteins.

12
Fine-tuning STEAP1 protein expression and purification to preserve its conformation and function

Yao, X.; He, L.; Yoo, S.; Sun, H.; Pathakota, V.; Kaur, M.; Li, P.; Alba, B.

2026-02-18 biochemistry 10.64898/2026.02.16.706263 medRxiv
Top 0.1%
14.3%

Six-transmembrane Epithelial Antigen of the Prostate 1 (STEAP1) has emerged as a promising therapeutic target for prostate cancer. We have optimized the expression and purification conditions of human STEAP1 to maximize the production of its homotrimeric form, which is crucial for metal ion reduction and maintaining cellular redox balance. Proteins obtained from these optimized conditions were complexed with both heme and flavin-adenine dinucleotide (FAD), two cofactors that are fundamental to STEAP functionality, suggesting native folding and interactions of the protein. In addition, we compared the impact of stable and transient expression systems on the protein quality of STEAP1. We found that stable expression promoted heme incorporation, improved expression homogeneity, and ensured correct protein orientation on cell surfaces. Our findings present effective strategies for optimizing the recombinant production of STEAP1, with potential applicability to other STEAP family proteins to facilitate therapeutic discovery.

13
Structure of human aldehyde oxidase under tris(2-carboxyethyl)phosphine-reducing conditions

Videira, C.; Esmaeeli, M.; Leimkuhler, S.; Romao, M. J.; Mota, C.

2026-03-25 biochemistry 10.64898/2026.03.25.713928 medRxiv
Top 0.1%
12.3%

The importance of human aldehyde oxidase (hAOX1) has increased over the last decades due to its involvement in drug metabolism. Inhibition studies concerning hAOX1 are extensive and a common reducing agent, dithiothreitol (DTT), was recently found to inactivate the enzyme. However, in previous crystallographic studies of hAOX1, DTT was found to be essential for crystallization. To address this concern, another reducing agent was used in crystallization trials. Using tris(2-carboxyethyl)phosphine (TCEP), a sulphur-free reducing agent, it was possible to obtain well-ordered crystals from hAOX1 wild type and variant, hAOX1_6A, which diffracted beyond 2.3 Å. Instead of the typical star-shaped crystals of hAOX1, at pH 4.7, plates are obtained in the orthorhombic space group (P22121) with two molecules in the asymmetric unit. Activity assays with the enzyme incubated with both reducing agents show that, contrary to DTT, TCEP does not lead to irreversible inactivation of the enzyme. The replacement of DTT with TCEP in crystallization of hAOX1 provides a strategy to circumvent enzyme inactivation during crystallographic studies, allowing future applications of new assays, such as time-resolved crystallography.

14
High-pH NMR to Identify Macromolecular Hydrogen-Bonds and Foldons

Alexandrescu, A.; Rua, A. J.; Shah, S.; Farirchild, D.; Bezsonova, I.

2026-03-03 biophysics 10.64898/2026.02.28.708709 medRxiv
Top 0.1%
12.2%

Hydrogen bond (H-bond) restraints are critical for NMR structure determination, yet their experimental identification can be challenging for marginally stable structures that afford insufficient protection from H/D exchange in D2O. As an alternative, we explored the use of NMR between pH 10 and 11, conditions that promote rapid exchange, for identifying backbone amide protons involved in H-bonds. We analyzed ~750 amide sites distributed across ten proteins with known structures. Persistence of amide protons at high pH in standard 2D 1H-15N HSQC spectra for 15N-labeled proteins in H2O, or TOCSY for unlabeled proteins, identifies H-bonds with ~91% accuracy that exceeds the ~80% accuracy of traditional H/D exchange experiments in D2O. For two α-helical coiled coils and three globular proteins, we performed alkaline unfolding experiments taking advantage of amide NMR signal attenuation from unstructured polypeptides. Increasing the sample pH led to a progressive loss of native amide proton NMR signals, revealing an unfolding hierarchy where "foldons" remaining at the highest pH values had the most persistent H-bonds under EX1 exchange conditions. The foldons observed at high pH are consistent with partially folded structures previously characterized near neutral pH by native state hydrogen exchange, equilibrium unfolding, and protein fragment studies. For β-sheet proteins, foldons correspond to regions with high inter-residue contact density, whereas in coiled coils they demarcate regions with high α-helical propensity. High-pH NMR experiments provide a sensitive, fast, inexpensive, and broadly applicable approach to map H-bonding in marginally stable or partially folded proteins. Additionally, they offer the opportunity to explore uncharted protein dynamics and unfolding pathways under basic pH conditions.

15
Nearest Neighbour Interactions between Amino Acid Residues in Short Peptides and Coil Libraries

Schweitzer-Stenner, R.

2026-01-22 biophysics 10.64898/2026.01.19.700493 medRxiv
Top 0.1%
12.2%

Intrinsically disordered proteins (IDPs) or proteins with intrinsically disordered regions (IDRs) perform a plethora of functions, mostly in a cellular environment. As unfolded proteins, IDPs can adopt molten globule or coil ensembles of conformations. Regarding the latter, the question arises whether they are describable as a self-avoiding random coil. Locally, this requires that amino acid residues sample the entire sterically allowed region of the Ramachandran plot with very similar probabilities and independently of the conformational dynamics of their neighbours. However, various lines of experimental and bioinformatic evidence suggest a more restricted, side chain and nearest neighbour dependent conformational space for individual residues. Over the last 25 years short peptides and coil libraries were employed to determine conformational propensities of amino acid residues in unfolded states. The question arises whether conformational ensembles obtained from these two sources are comparable. In this paper, a variety of metrics were used to compare Ramachandran plots of a limited number of GXYG peptides (X,Y: guest residues) with XY dimers in the coil library of Ting et al. (PLOS 6, e1000763, 2010). The results reveal major differences between corresponding plots, which might in part be due to the fact that solely the influence of one of the two neighbours of a given residue is probed by the above coil library while averages were taken over the respective opposite neighbours. The presented results suggest that coil libraries alone might not be a sufficient tool for determining the characteristics of statistical coils of IDPs and IDRs alike.

16
Do AI Models for Protein Structure Prediction Get Electrostatics Right?

Makhatadze, G. I.

2026-03-13 biophysics 10.64898/2026.03.11.711144 medRxiv
Top 0.1%
12.1%

A variant of the U1A protein containing four substitutions to ionizable residues was generated serendipitously due to a miscommunication. Biophysical measurements show that this variant has at least twice as much helical structure as the wild-type U1A and is trimeric in solution, in contrast to the monomeric wild type. In sharp contrast, structures predicted by deep-learning AI tools (AlphaFold2 and RoseTTAFold2) and transformer-based tools (OmegaFold and ESMFold) are all highly similar to the wild-type U1A (backbone RMSD < 1 Å). Even more surprising, two of the substituted ionizable residues are predicted to be fully buried in the non-polar core of the protein, an outcome that contradicts well-established physico-chemical principles, as ionizable residues are normally located on the protein surface. To explore this effect further, we generated sequences containing up to all twelve residues that make up the non-polar core of U1A. Across thousands of sequences, and depending on the AI model used, the majority of predicted structures contained fully buried ionizable residues while still maintaining the overall U1A fold. We then examined two additional proteins of comparable size, acylphosphatase and the de novo-designed TOP7 fold, and observed the same phenomenon: AI models frequently predicted structures with buried ionizable residues that nevertheless retained the parent fold. When these AI-predicted structures were subjected to short (50 ns) molecular dynamics simulations using physics-based force fields such as CHARMM or AMBER, the structures rapidly relaxed into ensembles that exposed ionizable residues. We conclude that while AI-based structure prediction tools perform extremely well on naturally occurring sequences, they do not reliably encode the physico-chemical principles governing the placement of ionizable residues. A straightforward remedy is to include a brief molecular dynamics simulation as a final validation step for AI-generated structures.

17
IDPForge: Deep Learning of Proteins with Global and Local Regions of Disorder

De Castro, S.; Zhang, O.; Liu, Z. H.; Forman-Kay, J. D.; Head-Gordon, T.

2026-03-27 biophysics 10.64898/2026.03.25.714313 medRxiv
Top 0.1%
10.4%

Although machine learning has transformed protein structure prediction of folded protein ground states with remarkable accuracy, intrinsically disordered proteins and regions (IDPs/IDRs) are defined by diverse and dynamical structural ensembles that are predicted with low confidence by algorithms such as AlphaFold and RoseTTAFold. We present a new machine learning method, IDPForge (Intrinsically Disordered Protein, FOlded and disordered Region GEnerator), that exploits a transformer protein language diffusion model to create all-atom IDP ensembles and IDR disordered ensembles that maintain the folded domains. IDPForge does not require sequence-specific training, back transformations from coarse-grained representations, or ensemble reweighting, as in general the created IDP/IDR conformational ensembles show good agreement with solution experimental data, and options for biasing with experimental restraints are provided if desired. We envision that IDPForge with these diverse capabilities will facilitate integrative and structural studies for proteins that contain intrinsic disorder, and it is available as an open source resource for general use.

18
A conserved isoleucine gates the diffusion of small ligands to the active site of NiFe CO-dehydrogenase

Opdam, L.; Meneghello, M.; Guendon, C.; Chargelegue, J.; Fasano, A.; Jacq-Bailly, A.; Leger, C.; Fourmond, V.

2026-03-21 biochemistry 10.64898/2026.03.19.713016 medRxiv
Top 0.1%
10.2%

CO dehydrogenases (CODH) are metalloenzymes that reversibly oxidize CO to CO2, at a buried NiFe4S4 active site. The substrates, CO and CO2, need therefore to be transported through the protein matrix to reach the active site. The most likely pathway for intra-protein diffusion is the hydrophobic channel identified in the crystal structures. Here, we use site-directed mutagenesis to study the highly conserved isoleucine 563 of Thermococcus sp. AM4 CODH2. Mutations at this position change the biochemical properties (KM for CO, product inhibition constant, catalytic bias...), and increase the resistance of the enzyme to the inhibitor O2, showing that isoleucine 563 indeed lines the gas channel. The I563F mutation decreases the bimolecular rate constant of inhibition by O2 15-fold, and increases the IC50 20-fold, which is the strongest improvement in O2 resistance reported so far. We show that the size of the introduced amino acids is less important than their flexibility - along with the size of the cavity formed near the active site in the channel. We also conclude that O2 access to the active site cannot be slowed down without also affecting CO diffusion. This tradeoff will have to be considered in further attempts to use site-directed mutagenesis to make CODHs more O2 tolerant.

19
Density-guided AlphaFold3 uncovers unmodelled conformations in β2-microglobulin

Maddipatla, S. A.; Vedula, S.; Bronstein, A. M.; Marx, A.

2026-03-02 bioinformatics 10.64898/2026.02.27.708490 medRxiv
Top 0.1%
10.2%

Although X-ray crystallography captures the ensemble of conformations present within the crystal lattice, models typically depict only the most dominant conformation, obscuring the existence of alternative states. Applying the electron density-guided AlphaFold3 approach to β2-microglobulin highlights how ensembles of alternate backbone conformations can be systematically modeled directly from crystallographic maps. This study also highlights how the detection of conformational ensembles is affected by the local quality of electron density and subtle variations in crystallization conditions and lattice packing. These results demonstrate that density-guided AlphaFold3 can uncover conformational heterogeneity missed by conventional refinement, offering a robust, systematic framework to capture the full structural landscape of proteins in crystals and enhancing the interpretive power of macromolecular crystallography.

Synopsis: Electron-density-guided AlphaFold3 reveals previously unmodeled conformational heterogeneity in β2-microglobulin and shows how crystal packing influences ensemble detection in X-ray crystallography.

20
CROWN: Curated Repository Of Well-resolved Noncovalent interactions

Poelmans, R.; Van Eynde, W.; Bruncsics, B.; Bruncsics, B.; Arany, A.; Moreau, Y.; Voet, A. R.

2026-04-01 bioinformatics 10.64898/2026.03.30.714168 medRxiv
Top 0.1%
10.1%

The development of machine learning models for protein-ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein-ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Non-covalent interactions), a machine learning-ready dataset that reconciles this tension by applying a comprehensive, fully automated preprocessing pipeline to the PLInder database. Starting from 649,915 protein-ligand interaction systems, CROWN applies a series of interleaved quality filters and processing stages addressing crystallographic resolution, ligand identity, pocket completeness, structural repair, interaction quality, and protonation at physiological pH. A distinguishing feature of the pipeline is a final constrained energy minimisation step using custom flat-bottomed restraints, which balances crystallographic evidence with relaxation of intramolecular strain. This step, absent from existing protein-ligand datasets, produces structurally uniform complexes by reconciling the heterogeneous refinement practices of different crystallographers and structure determination protocols, without distorting the experimentally observed binding geometry. The resulting dataset of 153,005 complexes represents a roughly four-fold increase in protein and species diversity over PDBBind and HiQBind, while maintaining rigorous structural standards.
Importantly, CROWN adopts a geometry-centric design philosophy that treats the 3D arrangement of atoms at the binding interface as a self-consistent source of information, rather than relying on externally measured binding affinities that cover only a fraction of known structures and introduce well-documented biases. We anticipate that CROWN will serve as a broadly useful resource for training generative models of protein-ligand binding poses, developing scoring functions, and benchmarking interaction prediction methods.
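
As a rough illustration of the flat-bottomed restraints mentioned in the CROWN abstract: the abstract does not give the functional form or any parameters, so the sketch below assumes a generic form that is zero within a tolerance of the reference position and harmonic beyond it. The function name, the tolerance, and the force constant are hypothetical and are not taken from the CROWN pipeline.

```python
def flat_bottom_energy(displacement, tolerance=0.5, k=10.0):
    """Generic flat-bottomed harmonic restraint (illustrative assumption).

    displacement: distance of an atom from its reference position
    tolerance:    half-width of the penalty-free region (hypothetical value)
    k:            force constant outside that region (hypothetical value)
    """
    # Only the part of the displacement beyond the tolerance is penalised.
    excess = max(0.0, abs(displacement) - tolerance)
    return 0.5 * k * excess ** 2


if __name__ == "__main__":
    # Inside the flat region the atom moves freely; outside, it is pulled back.
    for d in (0.0, 0.3, 1.5):
        print(d, flat_bottom_energy(d))
```

Under this assumed form, small relaxations of strained geometry cost nothing, while larger departures from the crystallographic position are penalised quadratically. That is one way a minimisation could relieve strain while staying close to the observed coordinates.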